Statistical Voice Conversion with WaveNet-Based Waveform Generation

نویسندگان

  • Kazuhiro Kobayashi
  • Tomoki Hayashi
  • Akira Tamamori
  • Tomoki Toda
چکیده

This paper presents a statistical voice conversion (VC) technique with the WaveNet-based waveform generation. VC based on a Gaussian mixture model (GMM) makes it possible to convert the speaker identity of a source speaker into that of a target speaker. However, in the conventional vocoding process, various factors such as F0 extraction errors, parameterization errors and over-smoothing effects of converted feature trajectory cause the modeling errors of the speech waveform, which usually bring about sound quality degradation of the converted voice. To address this issue, we apply a direct waveform generation technique based on a WaveNet vocoder to VC. In the proposed method, first, the acoustic features of the source speaker are converted into those of the target speaker based on the GMM. Then, the waveform samples of the converted voice are generated based on the WaveNet vocoder conditioned on the converted acoustic features. In this paper, to investigate the modeling accuracies of the converted speech waveform, we compare several types of the acoustic features for training and synthesizing based on the WaveNet vocoder. The experimental results confirmed that the proposed VC technique achieves higher conversion accuracy on speaker individuality with comparable sound quality compared to the conventional VC technique.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Islanding Detection Method of Distributed Generation Based on Wavenet

Due to the increasing need to distributed energy resources in power systems, their problems should be studied. One of the main problem of distributed energy resources is unplanned islanding. The unplanned islanding has some dangers to the power systems and the repairman which are works with the incorrect devices. In this paper, a passive local method is proposed. The proposed method is based on...

متن کامل

Statistical singing voice conversion based on direct waveform modification with global variance

This paper presents techniques to improve the quality of voices generated through statistical singing voice conversion with direct waveform modification based on spectrum differential (DIFFSVC). The DIFFSVC method makes it possible to convert singing voice characteristics of a source singer into those of a target singer without using vocoder-based waveform generation. However, quality of the co...

متن کامل

Statistical singing voice conversion with direct waveform modification based on the spectrum differential

This paper presents a novel statistical singing voice conversion (SVC) technique with direct waveform modification based on the spectrum differential that can convert voice timbre of a source singer into that of a target singer without using a vocoder to generate converted singing voice waveforms. SVC makes it possible to convert singing voice characteristics of an arbitrary source singer into ...

متن کامل

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...

متن کامل

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016

This paper presents the NU-NAIST voice conversion (VC) system for the Voice Conversion Challenge 2016 (VCC 2016) developed by a joint team of Nagoya University and Nara Institute of Science and Technology. Statistical VC based on a Gaussian mixture model makes it possible to convert speaker identity of a source speaker’ voice into that of a target speaker by converting several speech parameters...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017